Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm

Authors

  • T. Aswani
  • Nageswara Rao

Abstract

In comparison with hard clustering methods, in which a pattern belongs to a single cluster, fuzzy clustering algorithms allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, since a sentence is likely to be related to more than one theme or topic present within a document or set of documents. However, because most sentence similarity measures do not represent sentences in a common metric space, conventional fuzzy clustering approaches based on prototypes or mixtures of Gaussians are generally not applicable to sentence clustering. This paper presents a novel fuzzy clustering algorithm that operates on relational input data; i.e., data in the form of a square matrix of pairwise similarities between data objects. The algorithm uses a graph representation of the data, and operates in an Expectation-Maximization framework in which the graph centrality of each object is interpreted as a likelihood. Results of applying the algorithm to sentence clustering tasks demonstrate that the algorithm is capable of identifying overlapping clusters of semantically related sentences, and that it is therefore of potential use in a variety of text mining tasks. We also include results of applying the algorithm to benchmark data sets in several other domains.
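The Expectation-Maximization loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `weighted_pagerank` and `fuzzy_relational_cluster` are hypothetical, and the per-cluster edge weights (each similarity scaled by both endpoints' memberships), the damping factor of 0.85, the random initialization, and the stopping tolerances are assumptions chosen for illustration. In the E-step, each cluster's membership-weighted graph is scored with PageRank and those centrality scores are treated as likelihoods; memberships are then the mixing-weighted likelihoods normalized over clusters. In the M-step, each mixing coefficient becomes the mean membership of its cluster.

```python
import random


def weighted_pagerank(w, damping=0.85, max_iter=100, tol=1e-9):
    """PageRank on a non-negative weight matrix `w` (list of lists).

    Each node j passes damping * pr[j] to its neighbours in proportion to
    w[j][i] / sum(w[j]); every node also receives a constant (1 - damping).
    """
    n = len(w)
    out = [sum(row) for row in w]  # total outgoing weight per node
    pr = [1.0] * n
    for _ in range(max_iter):
        new = [
            (1 - damping)
            + damping * sum(w[j][i] / out[j] * pr[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new, pr))
        pr = new
        if delta < tol:
            break
    return pr


def fuzzy_relational_cluster(sim, n_clusters, max_iter=200, tol=1e-6, seed=0):
    """EM-style fuzzy clustering of a square similarity matrix `sim`.

    Hypothetical helper (an assumption, not the paper's code). Returns
    (memb, mixing): memb[i][m] is the membership of object i in cluster m
    (each row sums to 1); mixing[m] is cluster m's mixing coefficient.
    """
    rng = random.Random(seed)
    n = len(sim)

    # Initialisation: random memberships normalised per object; equal mixing.
    memb = []
    for _ in range(n):
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        memb.append([v / total for v in row])
    mixing = [1.0 / n_clusters] * n_clusters

    for _ in range(max_iter):
        # E-step, part 1: for each cluster, weight every edge by both
        # endpoints' memberships (self-loops dropped) and use PageRank
        # centrality in that graph as each object's likelihood.
        like = []
        for m in range(n_clusters):
            w = [[0.0 if i == j else sim[i][j] * memb[i][m] * memb[j][m]
                  for j in range(n)] for i in range(n)]
            like.append(weighted_pagerank(w))

        # E-step, part 2: memberships are mixing * likelihood, normalised
        # over clusters (Bayes' rule).
        new_memb = []
        for i in range(n):
            scores = [mixing[m] * like[m][i] for m in range(n_clusters)]
            total = sum(scores)
            new_memb.append([s / total for s in scores])

        # M-step: each mixing coefficient is the mean membership of its cluster.
        mixing = [sum(row[m] for row in new_memb) / n for m in range(n_clusters)]

        delta = max(abs(a - b) for new_row, old_row in zip(new_memb, memb)
                    for a, b in zip(new_row, old_row))
        memb = new_memb
        if delta < tol:
            break
    return memb, mixing


if __name__ == "__main__":
    # Toy relational input: two groups of three "sentences", with high
    # within-group and low between-group similarity.
    sim = [[1.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1)
            for j in range(6)] for i in range(6)]
    memb, mixing = fuzzy_relational_cluster(sim, n_clusters=2)
    for i, row in enumerate(memb):
        print(i, [round(v, 3) for v in row])
    print("mixing:", [round(p, 3) for p in mixing])
```

Because the sketch consumes only the similarity matrix `sim`, any sentence similarity measure can be substituted without first embedding sentences in a common metric space, which is the property the abstract emphasizes. Each membership row sums to 1, so a sentence can belong partly to several clusters.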

Similar resources

Sentence Level Text Clustering using a Hierarchical Fuzzy Relational Clustering Algorithm

Clustering is the process of grouping or aggregating data items. Sentence clustering is used in a variety of applications, such as document classification and categorization, automatic summary generation, and document organization. In text processing, sentence clustering plays a vital role in text mining activities. The size of the clusters may vary from one cluster to another...

Optimal Sentence Clustering Using An Innovative Hierarchical Fuzzy Clustering Algorithm

Data clustering plays an indispensable role in many text processing activities. Much research is ongoing in this area because of its wide range of applications. Sentence clustering is more challenging than other kinds of data clustering, because the same idea can be expressed in different ways. For example, some people see a glass as half empty and others see it as half full. Due to this...

Clustering Sentence-Level Text Using a Fuzzy Back-Propagation Clustering Algorithm

In comparison with hard clustering methods, in which a pattern belongs to a single cluster, fuzzy clustering algorithms allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, as a sentence may relate to more than one topic present within a document or set of documents. Since most sentence similarity measure...

Survey on Clustering Algorithm for Sentence Level Text

Clustering is an extensively studied data mining problem in the text domain. It finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. In text mining, sentence clustering is one such process and is used within general text mining tasks. Several clustering methods and algorithms are used...

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...

Publication date: 2014